Squish: Near-Optimal Compression for Archival of Relational Datasets

Abstract

Relational datasets are being generated at an alarmingly rapid rate across organizations and industries. Compressing these datasets could significantly reduce storage and archival costs. Traditional compression algorithms, e.g., gzip, are suboptimal for compressing relational datasets since they ignore the table structure and relationships between attributes. We study compression algorithms that leverage the relational structure to compress datasets to a much greater extent. We develop Squish, a system that uses a combination of Bayesian Networks and Arithmetic Coding to capture multiple kinds of dependencies among attributes and achieve near-entropy compression rate. Squish also supports user-defined attributes: users can instantiate new data types by simply implementing five functions for a new class interface. We prove the asymptotic optimality of our compression algorithm and conduct experiments to show the effectiveness of our system: Squish achieves a reduction of over 50% in storage size relative to systems developed in prior work on a variety of real datasets.
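The abstract's central idea is that modeling dependencies between attributes lowers the number of bits an entropy coder needs. The following is a minimal sketch of that principle, not Squish's implementation. It assumes a toy two-column table and a hand-picked two-node Bayesian network (city → state). It compares the ideal code length, the sum of -log2 p over all cells, under two models: one that codes each column independently, and one that codes the state conditioned on the city. An arithmetic coder driven by these probabilities would come within a couple of bits of each total. The function names and data are hypothetical, and the cost of storing the model itself is ignored.

```python
import math
from collections import Counter

# Hypothetical helpers for illustration only; not part of Squish.

def ideal_bits_independent(rows):
    """Sum of -log2 p over all cells when every column is coded
    with its own empirical marginal distribution (no dependencies)."""
    n = len(rows)
    total = 0.0
    for col in range(len(rows[0])):
        counts = Counter(r[col] for r in rows)
        total += sum(-math.log2(counts[r[col]] / n) for r in rows)
    return total

def ideal_bits_chain(rows):
    """Same quantity for a two-node Bayesian network X -> Y:
    X is coded with P(x), Y with the empirical conditional P(y | x)."""
    n = len(rows)
    x_counts = Counter(x for x, _ in rows)
    xy_counts = Counter(rows)
    bits = 0.0
    for x, y in rows:
        bits += -math.log2(x_counts[x] / n)                  # cost of x
        bits += -math.log2(xy_counts[(x, y)] / x_counts[x])  # cost of y given x
    return bits

# Toy table: the city column fully determines the state column.
rows = [("Urbana", "IL"), ("Chicago", "IL"), ("Austin", "TX"), ("Dallas", "TX")] * 25

print(ideal_bits_independent(rows))  # 300.0
print(ideal_bits_chain(rows))        # 200.0
```

Because the state is a deterministic function of the city, the conditional model spends 0 bits on the state column. This is the kind of cross-attribute redundancy that, according to the abstract, general-purpose tools such as gzip ignore. The probabilities here are estimated from the same rows being coded, so a real compressor would also have to store (or adaptively learn) the model, which this sketch does not account for.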

Bibliographic details

Authors: Gao, Yihan; Parameswaran, Aditya
Year: 2016
Original format: PDF

Similar documents

1. FQC: A novel approach for efficient compression, archival, and dissemination of fastq datasets [J]. Dutta Anirban, Haque Mohammed Monzoorul, Bose Tungadri. Journal of Bioinformatics and Computational Biology, 2015, no. 3.

2. Indexing Musical Sequences in Large Datasets Using Relational Databases [J]. Aleksey Charapko, Ching-Hua Chuan. International Journal of Multimedia Data Engineering & Management, 2015, no. 2.

3. A Framework for Migrating Relational Datasets to NoSQL [J]. Leonardo Rocha, Fernando Vale, Elder Cirilo. Procedia Computer Science, 2015, no. 1.

4. Fast Anomaly Detection in Dynamic Clinical Datasets Using Near-Optimal Hashing with Concentric Expansions [C]. Syed Zeeshan, Rubinfeld Ilan. 10th IEEE International Conference on Data Mining Workshops, 2010.

5. Entity resolution for large relational datasets [D]. Guo, Zhaochen. 2010.

6. Squish: Near-Optimal Compression for Archival of Relational Datasets [O]. Yihan Gao, Aditya Parameswaran.

7. The Sloan Digital Sky Survey Science Archive: Migrating a Multi-Terabyte Astronomical Archive from Object to Relational DBMS [O]. Thakar, A R, Szalay, A S, Kunszt, Peter Z. 2004.

8. Unsupervised Group Discovery and Link Prediction in Relational Datasets: A Nonparametric Bayesian Approach [R]. Koutsourelakis, P. S. 2007.

